5,043 research outputs found

    On the Design of LQR Kernels for Efficient Controller Learning

    Full text link
    Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance. Comment: 8 pages, 5 figures, to appear in the 56th IEEE Conference on Decision and Control (CDC 2017).
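
    As a hedged sketch of the idea above (not the paper's exact constructions): for a scalar first-order system, one way to build a kernel that leverages LQR structure is to compare design weights through the optimal gains they induce, so that parameters producing similar closed loops are strongly correlated. The system (A, B) and the functions lqr_gain and lqr_kernel below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch, not the paper's exact construction: for the scalar system
# x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2, the optimal LQR gain
# follows from the scalar discrete-time Riccati equation. A kernel that
# compares design weights through the gains they induce correlates
# controllers that produce similar closed loops.

A, B = 0.9, 0.5  # assumed first-order system

def lqr_gain(q, r, a=A, b=B, iters=200):
    """Optimal feedback gain f (u = -f*x) via fixed-point Riccati iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

def lqr_kernel(theta1, theta2, ell=0.2):
    """Squared-exponential kernel on the LQR gains induced by the design
    weight ratio theta = q/r (an illustrative 'LQR kernel' flavour)."""
    f1, f2 = lqr_gain(theta1, 1.0), lqr_gain(theta2, 1.0)
    return np.exp(-0.5 * (f1 - f2) ** 2 / ell ** 2)

print(lqr_kernel(1.0, 10.0))  # weights differ by 10x; the induced gains much less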

    Automatic LQR Tuning Based on Gaussian Process Global Optimization

    Full text link
    This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework is expected to yield improved controllers with fewer evaluations than alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Results of two- and four-dimensional tuning problems highlight the method's potential for automatic controller tuning on robotic platforms. Comment: 8 pages, 5 figures, to appear in IEEE 2016 International Conference on Robotics and Automation. Video demonstration of the experiments available at https://am.is.tuebingen.mpg.de/publications/marco_icra_201
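
    The minimal sketch below illustrates the parametrization described above: BO searches over LQR design weights rather than raw gains, and each query triggers a trial whose observed cost is fed back to the optimizer. The double-integrator model, the log-scale weight parametrization, and experiment_cost (a simulated stand-in for a robot trial) are illustrative assumptions; the paper's BO strategy, Entropy Search, is not implemented here.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # assumed double-integrator model
B = np.array([[0.0], [0.1]])

def lqr_from_theta(theta):
    """Map BO parameters theta (log10 of state weights) to an LQR gain."""
    Q = np.diag(10.0 ** np.asarray(theta, dtype=float))
    R = np.eye(1)
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def experiment_cost(theta, x0=np.array([1.0, 0.0]), T=100):
    """Stand-in for one experimental trial: roll out the closed loop and
    return the observed quadratic cost (on the robot, this is where the
    real evaluation would happen)."""
    K, x, cost = lqr_from_theta(theta), x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        cost += x @ x + float(u @ u)
        x = A @ x + B @ u
    return cost

print(experiment_cost([0.0, 1.0]))  # one candidate a BO loop would query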

    Bayesian Optimization in Robot Learning - Automatic Controller Tuning and Sample-Efficient Methods

    Get PDF
    The problem of designing controllers to regulate dynamical systems has been studied by engineers for millennia.
Ever since, suboptimal performance has lingered in many closed loops as an unavoidable side effect of manually tuning controller parameters. Nowadays, industrial settings remain skeptical of data-driven methods that allow one to learn controller parameters automatically. In the context of robotics, machine learning (ML) keeps growing its influence on autonomy and adaptability, for example by helping to automate controller tuning. However, data-hungry ML methods, such as standard reinforcement learning, require a large number of experimental samples, which is prohibitive in robotics, where hardware can deteriorate and break. This brings about the following question: Can manual controller tuning, in robotics, be automated by using data-efficient machine learning techniques?

In this thesis, we tackle this question by exploring Bayesian optimization (BO), a data-efficient ML framework, to replace the human effort and side effects of manual controller tuning while retaining a low number of experimental samples. We focus on robotic systems, providing thorough theoretical results that aim to increase data-efficiency, as well as demonstrations on real robots. Specifically, we present four main contributions.

First, we consider using BO to replace manual tuning on robotic platforms. To this end, we parametrize the design weights of a linear quadratic regulator (LQR) and learn them with an information-efficient BO algorithm. This algorithm uses Gaussian processes (GPs) to model the unknown performance objective. The GP model is used by BO to suggest controller parameters that are expected to increase the information about the optimal parameters, measured as a gain in entropy. The resulting "automatic LQR tuning" framework is demonstrated on two robotic platforms: a robot arm balancing an inverted pole and a humanoid robot performing a squatting task. In both cases, an existing controller is automatically improved in a handful of experiments without human intervention.

BO compensates for data scarcity by means of the GP, a probabilistic model that encodes prior assumptions about the unknown performance objective. Incorrect or uninformed assumptions typically have negative consequences, such as a higher number of robot experiments, poor tuning performance, or reduced sample-efficiency. The second to fourth contributions attempt to alleviate this issue.

The second contribution proposes to include the robot simulator in the learning loop as an additional information source for automatic controller tuning. While a real robot experiment generally entails high costs (e.g., it requires preparation and takes time), simulations are cheaper to obtain (e.g., they can be computed faster). However, because the simulator is an imperfect model of the robot, its information is biased and can harm learning performance. To address this problem, we propose "simu-vs-real", a principled multi-fidelity BO algorithm that trades off cheap but inaccurate information from simulations against expensive and accurate physical experiments in a cost-effective manner. The resulting algorithm is demonstrated on a cart-pole system, where simulations and real experiments are alternated, sparing many real evaluations.

The third contribution explores how to match the expressiveness of the probabilistic prior to the control problem at hand. To this end, the mathematical structure of LQR controllers is leveraged and embedded into the GP by means of the kernel function. Specifically, we propose two different "LQR kernel" designs that retain the flexibility of Bayesian nonparametric learning. Simulated results indicate that the LQR kernels yield superior performance to uninformed kernel choices when used for controller learning with BO.

Finally, the fourth contribution addresses the problem of handling controller failures, which are typically unavoidable when learning from data, especially if non-conservative solutions are expected. Although controller failures are generally problematic (e.g., the robot has to be emergency-stopped), they are also a rich source of information about what should be avoided. We propose "failures-aware excursion search", a novel algorithm for Bayesian optimization under black-box constraints in which the number of failures is limited. Our results on numerical benchmarks indicate that allowing a confined number of failures reveals better optima than state-of-the-art methods.

The first contribution of this thesis, "automatic LQR tuning", is among the first to apply BO to real robots. While it demonstrated automatic controller learning from few experimental samples, it also revealed several important challenges, such as the need for higher sample-efficiency, which opened relevant research directions that we addressed through several methodological contributions. In summary, we proposed "simu-vs-real", a novel BO algorithm that includes the simulator as an additional information source; an "LQR kernel" design that learns faster than standard choices; and "failures-aware excursion search", a new BO algorithm for constrained black-box optimization problems in which the number of failures is limited.
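
    As a toy illustration of the cost-aware trade-off behind "simu-vs-real" described above, the decision rule below weighs each information source's expected gain against its evaluation cost. The function name, the per-cost rule, and all numbers are illustrative assumptions, not the thesis's actual multi-fidelity algorithm.

```python
# Toy decision rule only; the thesis's "simu-vs-real" is a principled
# multi-fidelity BO algorithm, not reproduced here. The costs and gains
# below are illustrative assumptions.

def choose_source(info_gain_sim, info_gain_real, cost_sim=1.0, cost_real=20.0):
    """Query the information source with the larger expected information
    gain per unit cost: the simulator is cheap but biased, the robot
    experiment expensive but exact."""
    if info_gain_sim / cost_sim >= info_gain_real / cost_real:
        return "sim"
    return "real"

print(choose_source(0.3, 2.0))  # the cheap simulation still wins: 0.3/1 > 2/20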

    Gait learning for soft microrobots controlled by light fields

    Full text link
    Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing conditions. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semi-synthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot's locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on light-controlled soft microrobots and probabilistic learning control. Comment: 8 pages, 7 figures, to appear in the proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems 201
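
    A minimal sketch of the prior-comparison step mentioned above: candidate GP priors can be ranked on (semi-)synthetic data by their log marginal likelihood. The synthetic gait data and the two candidate kernels below are assumptions for illustration; the paper's features and priors may differ.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# Synthetic stand-in for the semi-synthetic gait data used in the paper;
# the inputs could be, e.g., gait frequency and amplitude.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(20)

# Rank candidate priors by their log marginal likelihood on the data.
for name, kern in [("RBF", RBF(0.3)), ("Matern-5/2", Matern(0.3, nu=2.5))]:
    gp = GaussianProcessRegressor(kernel=kern, alpha=1e-4, normalize_y=True).fit(X, y)
    print(name, gp.log_marginal_likelihood_value_)  # higher = better prior fit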

    Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

    Full text link
    PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, demonstrating fast and data-efficient policy learning, even on complex real-world problems. Comment: Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA).
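
    The state extension mentioned above can be made concrete with a short sketch: augmenting the state with the integrated error and a finite-difference derivative turns the PID law into a static state feedback u = -K z, which is what lets a policy search method treat PID tuning as learning a parametric policy. The toy plant and gain values below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

dt = 0.01
Kp, Ki, Kd = 8.0, 2.0, 1.0  # assumed PID gains (the quantities PILCO would learn)

def pid_as_state_feedback(x, x_prev, integral, ref=0.0):
    """PID written as u = -K @ z on the extended state z = [e, int(e), de/dt]."""
    e = x - ref
    integral += e * dt
    z = np.array([e, integral, (x - x_prev) / dt])
    K = np.array([Kp, Ki, Kd])
    return -K @ z, integral

# Closed loop on a toy unstable first-order plant x' = x + u.
x, x_prev, integral = 1.0, 1.0, 0.0
for _ in range(500):
    u, integral = pid_as_state_feedback(x, x_prev, integral)
    x, x_prev = x + dt * (x + u), x
print(round(x, 4))  # regulated toward the reference 0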

    The CKM parameters in the SMEFT

    Full text link
    The extraction of the Cabibbo-Kobayashi-Maskawa (CKM) matrix from flavour observables can be affected by physics beyond the Standard Model (SM). We provide a general roadmap to take this into account, which we apply to the case of the Standard Model Effective Field Theory (SMEFT). We choose a set of four input observables that determine the four Wolfenstein parameters, and discuss how the effects of dimension-six operators can be included in their definition. We provide numerical values and confidence intervals for the CKM parameters, and compare them with the results of CKM fits obtained in the SM context. Our approach allows one to perform general SMEFT analyses in a consistent fashion, independently of any assumptions about the way new physics affects flavour observables. We discuss a few examples illustrating how our approach can be implemented in practice. Comment: 36 pages. Version published in JHEP.
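
    For reference, the four Wolfenstein parameters referred to above enter through the standard parametrization of the CKM matrix (written here with (rho, eta); the paper determines these parameters from four input observables):

```latex
% Standard Wolfenstein parametrization of the CKM matrix, valid to O(lambda^4).
V_{\mathrm{CKM}} \simeq
\begin{pmatrix}
1 - \tfrac{\lambda^2}{2} & \lambda & A\lambda^3(\rho - i\eta) \\
-\lambda & 1 - \tfrac{\lambda^2}{2} & A\lambda^2 \\
A\lambda^3(1 - \rho - i\eta) & -A\lambda^2 & 1
\end{pmatrix} + \mathcal{O}(\lambda^4)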

    Treescapes

    Get PDF
    We’ve each been looking to the trees for a long time. One of us painting, the other writing, with, by the trees. In the middle of the city and its noise, finding the branches. Standing, inquiring, returning. Why the trees, how we belong to each other, is a question worth asking again and again. These paintings and poems are part of an ongoing conversation, of many layers, of many trees, of what we lose and find under their canopies, in blooms, in dirt & seasons. What walking among the trees has taught us is that every art is an invitation to the mutuality of life. Through paintings it means creating an opening of treescapes and orchards for people to become a part of & inhabit. & every exchange of poetry is a welcoming to community, listening, growth.

    On the Impact of Shadowing on the Performance of Cooperative Medium Access Control Protocols

    No full text
    Accurate representation of the physical layer is instrumental for a sound design and optimization of Medium Access Control (MAC) protocols for cooperative wireless networks. However, the vast majority of MAC protocols are designed and analyzed by considering simplified physical layer and channel models, which often lead to overly optimistic performance predictions. In particular, even though many experimental activities have showcased the important role played by shadow-fading, most protocols are designed and evaluated by taking into account only the transmission distance (circular coverage model) or only the fast-fading. Motivated by the proven unsuitability of these models, the contribution of this paper is threefold: i) we provide important considerations on how to adequately include the effect of shadowing in the design of MAC protocols for cooperative networks; ii) we provide an analytical framework to determine the subset of active relays needed to meet a given Quality-of-Service (QoS) requirement; and iii) we study, through analysis and simulation, the performance of a promising MAC protocol for cooperative networks, called Persistent Relay Carrier Sensing Multiple Access (PRCSMA), explicitly taking into account the effect of shadowing. Our study shows that shadowing can dramatically change system and protocol performance.
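
    To make the shadowing point above concrete, the sketch below uses the standard log-normal shadowing model, under which the set of relays meeting an SNR-based QoS target is random rather than a fixed disk around the source. All numerical values are illustrative assumptions, and the paper's PRCSMA analysis is not reproduced here.

```python
import numpy as np

# Standard log-normal shadowing model; all numerical values are
# illustrative assumptions.
P_TX, PL_D0, N_EXP, SIGMA = 20.0, 40.0, 3.5, 8.0  # dBm, dB @ d0, exponent, dB
NOISE_DBM, SNR_MIN = -95.0, 10.0                  # noise floor, QoS target (dB)

rng = np.random.default_rng(1)

def active_relays(distances_m, d0=1.0):
    """Indices of relays whose received SNR meets the QoS threshold when a
    log-normal shadowing term X ~ N(0, SIGMA^2) is added to the path loss."""
    d = np.asarray(distances_m, dtype=float)
    path_loss = PL_D0 + 10.0 * N_EXP * np.log10(d / d0)
    shadowing = rng.normal(0.0, SIGMA, size=d.shape)
    snr_db = P_TX - (path_loss + shadowing) - NOISE_DBM
    return np.flatnonzero(snr_db >= SNR_MIN)

# Membership differs from call to call: shadowing breaks the circular
# coverage model, so the active set is not "all relays within radius r".
print(active_relays([30, 60, 90, 120, 150]))
print(active_relays([30, 60, 90, 120, 150]))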